Maps of Uncertainty
The techniques in this section allow us to create maps of where our uncertainty may be higher or lower. The most straightforward is a map that shows the standard deviation of our output results after a series of Monte Carlo (MC) runs. The approach, sketched in code after the list below, is to:
- Create a loop that builds a model and generates a prediction with a random component on each run. The random component could come from cross-validation, noise injection, sensitivity testing, or some combination of these.
- On each run, save the predicted values as a column in a matrix (one column per run).
- After the runs are completed, compute the standard deviation of the predicted values for each row.
- Use these standard deviations to create the map as you would for any other prediction values.
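Below is a minimal sketch of this loop, assuming noise injection as the random component and a simple linear model; the names TheData, Xs, Measures, NumRuns, and NoiseSD are hypothetical and would be replaced with your own data and settings.
# minimal sketch of the MC loop with noise injection (hypothetical names)
require(matrixStats)
NumRuns=100 # number of MC runs (an assumed value)
NoiseSD=0.1 # standard deviation of the injected noise (an assumed value)
PredictionMatrix=matrix(, nrow = nrow(TheData), ncol = 0) # empty matrix, one row per data point
Run=1
while (Run<=NumRuns) {
    NoisyData=TheData
    NoisyData$Xs=NoisyData$Xs+rnorm(nrow(NoisyData),mean=0,sd=NoiseSD) # inject noise into the predictor
    TheModel=lm(Measures~Xs,data=NoisyData) # any model type could be used here
    PredictedValues=predict(TheModel,TheData) # predict for the original rows
    PredictionMatrix=cbind(PredictionMatrix,PredictedValues) # save this run's predictions as a column
    Run=Run+1
}
TheData$PredSD=rowSds(PredictionMatrix) # one standard deviation per row, ready to map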
Note: be clear about whether you are mapping the variation in the model outputs or in the residuals. The basic method below maps the variation in the predicted values; the cross-validation approach that follows maps the variation in the residuals.
Basic Method
The solution below works for creating uncertainty maps in most situations; the exception is when we use cross-validation and predict with the original data. The next section shows an approach for that case.
This approach uses a matrix to capture all the predicted values. Each time a model run is completed, the predicted values are added as a column to the matrix. After all runs are finished, the standard deviation of each row can be computed, producing a vector that can be added back into a dataframe and then used to create an uncertainty map. This provides a map of the standard deviation of the predictions.
The first step is to load the 'matrixStats' library, which includes a number of functions for matrices, including one to compute the standard deviation of each row in a matrix.
# matrixStats includes a function to compute the std dev of a row
require(matrixStats)
Next, create an empty matrix that has the same number of rows as the dataframe and no columns (NumRowsInDataFrame below is the number of rows in your dataframe, i.e. nrow(TheData)).
# create an empty matrix
PredictionMatrix=matrix(, nrow = NumRowsInDataFrame, ncol = 0)
Then, inside your loop, compute the predicted values for all rows in your dataframe and add them as a column to your prediction matrix.
# add the predicted values for this run as a column to the matrix
PredictionMatrix=cbind(PredictionMatrix,PredictedValues)
Finally, after the loop, compute the standard deviations of the rows in the matrix. This results in a vector that can be added to the dataframe and then used to create a map. Note that matrixStats also includes functions for computing row means and other matrix statistics that can be used to evaluate models.
# get the standard deviation for each row in the predicted values matrix
PredictionStdDevs=rowSds(PredictionMatrix)
# get the mean for each row in the predicted values matrix
PredictionMeans=rowMeans(PredictionMatrix)
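To finish the workflow, the standard deviations can be added back into the dataframe and mapped. Below is a minimal sketch using ggplot2, assuming the dataframe (TheData here, a hypothetical name) contains coordinate columns Xs and Ys:
# add the standard deviations back into the dataframe
TheData$PredSD=PredictionStdDevs
# map the per-point uncertainty, assuming Xs and Ys hold coordinates
require(ggplot2)
ggplot(TheData, aes(x=Xs, y=Ys, color=PredSD)) +
    geom_point() +
    labs(color="Prediction SD")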
Creating Uncertainty Maps When Performing Cross-Validation and Predicting with the Original Data Set
There are times when we have a continuous response variable and are predicting values on the same data set that we used to create the model. These cases are rare but would include, for example, having a biomass or land cover data set, creating a model of biomass from it, and then running the model for future scenarios.
The approach below uses repeated k-fold cross-validation to find the residuals and then creates an uncertainty map from their variation.
#######################################################################
# code to perform repeated k-fold cross-validation to create an uncertainty map
require(mgcv) # gam() and predict.gam()
require(matrixStats) # rowSds()

TheData=data.frame(X1s,X2s)
TheData=data.frame(TheData,Measures)
TheData$ID=seq(1:nrow(TheData)) # add a column with ids for the original rows

NumRepeats=10 # number of times to repeat the cross-validation (one column of residuals per repeat)
ResidualsMatrix=matrix(NA, nrow = nrow(TheData), ncol = NumRepeats) # create an empty matrix for the residuals

RandomCount=1
while (RandomCount<=NumRepeats) {
    TheData2 <- TheData[sample(nrow(TheData)),] # shuffle the data
    TheFolds <- cut(seq(1,nrow(TheData2)),breaks=10,labels=FALSE) # divide the data into 10 folds
    FoldCount=1
    while (FoldCount<=10) {
        # split the data into test and training sets
        testIndexes <- which(TheFolds==FoldCount) # get the indexes to the next fold in the data
        testData <- TheData2[testIndexes, ] # use the test indexes to sub-sample the data into a test dataset
        trainData <- TheData2[-testIndexes, ] # the remaining rows become the training dataset

        # create a GAM based on the training data
        TheModel=gam(Measures~s(X1s),data=trainData,gamma=1.4)

        # create the prediction using the test data
        TestPrediction=predict.gam(TheModel,testData,type='response')

        # compute the residuals (predicted minus measured)
        Residuals=TestPrediction-testData$Measures

        # get the original indexes for the test data to update the residuals
        OriginalIndexes=TheData2$ID[testIndexes]

        # go through the residuals, adding them into the residuals matrix
        i=1
        NumRows=length(Residuals)
        while (i<=NumRows) {
            OriginalIndex=OriginalIndexes[i] # get the original index for this data point
            Value=unname(Residuals[i]) # get the residual value (unname() removes the name R attaches to predictions)
            ResidualsMatrix[OriginalIndex,RandomCount]=Value # set the value into the residuals matrix
            i=i+1
        }
        FoldCount=FoldCount+1
    }
    RandomCount=RandomCount+1
}
ResidualsStdDevs=rowSds(ResidualsMatrix) # get the standard deviation for each row in the residuals matrix
TheData$ResSD=ResidualsStdDevs # rows of the matrix follow the original (unshuffled) order
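The resulting ResSD column can be mapped just as in the basic method above; rows with high ResSD are locations where the model's predictions were least consistent across the repeated cross-validation runs and should be interpreted with more caution.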